

INTERACTIVE  AIDS  FOR 
CARTOGRAPHY  AND 
PHOTO  INTERPRETATION 


Semiannual  Technical  Report 


Covering the Period 12 May 1977 through 11 November 1977


December 1977


By: Harry G. Barrow, Principal Investigator
(415) 326-6200, Extension 5089


Sponsored  by: 

Defense  Advanced  Research  Projects  Agency 
1400 Wilson Boulevard
Arlington,  Virginia  22209 


Contract  DAAG29-76-C-0057 


ARPA  Order  No.  2894-5 


Program Element 61101E


Approved  for  public  release;  distribution  unlimited 


SRI  International 
333  Ravenswood  Avenue 
Menlo  Park,  California  94025 
(415)  326-6200 
Cable:  SRI  INTL  MPK 
TWX: 910-373-1246


Effective Date: 12 May 1976
Expiration Date: 9 October 1978
Amount of Contract: $521,363

SRI Project 5300


The views and conclusions contained in this document are those of the
authors and should not be interpreted as necessarily representing the
official policies, either expressed or implied, of the Defense Advanced
Research Projects Agency or the U.S. Government.



Approved: 

Peter  E.  Hart,  Director 
Artificial  Intelligence  Center 

Earle  D.  Jones,  Executive  Director 
Information  Science  and  Engineering  Division 




ABSTRACT 


The ARPA Image Understanding Project at SRI has the scientific goal
of investigating and developing ways in which diverse sources of
knowledge may be used to interpret images automatically. The research
is focused on the specific problems entailed in interpreting aerial
photographs for cartographic and intelligence purposes. A key concept
is the use of a generalized digital map to guide the process of image
interpretation.

In the past six months, the components developed under this project
have been integrated into a single system, known as Hawkeye. The system
includes modules for handling a large knowledge base in the form of a
semantic net and terrain data files, for using a raster-scan display and
graphics tablets, for running task-specific subsystems, and for
communicating with the user via graphics and in English. With the
Hawkeye system, a user (or a program) can retrieve and display digital
images, calibrate them in terms of world coordinates, make measurements
from them, and ask questions and give commands in English, such as:
"What is this road?", "Show me the Transamerica building", or "How many
photos are there of Alameda?" These capabilities are fundamental, both
for interactive and fully automated image analysis.




ILLUSTRATIONS 


1. The Hawkeye Work Station

2. System Organization

3. Defining a Display Window

4. An Enlarged Inset

5. Multiple Display Windows

6. A USGS Map of the San Francisco Bay Area

7. Display of the Digital Map Data Base

8. Part of the Map at Higher Resolution


ACKNOWLEDGMENTS 


This research was supported by the Defense Advanced Research
Projects Agency of the Department of Defense and was monitored by the
U.S. Army Research Office under Contract No. DAAG29-76-C-0057.

The work reported here was carried out by H.G. Barrow, R.C. Bolles,
T.D. Garvey, J.H. Kremers, K. Lantz,* J.M. Tenenbaum, and H.C. Wolf.


* Visiting from the Department of Computer Science, University of
Rochester.




I INTRODUCTION

A. Overview of the Project

This report describes the ongoing ARPA Image Understanding Project
at SRI. The central goal of this project is to investigate and develop
ways in which diverse sources of knowledge may be brought to bear on the
problem of interpreting images. The research is focused on the specific
problems of interpreting aerial photographs for cartographic or
intelligence purposes. Up to a certain point, cartography and photo
interpretation overlap considerably: In making a map from aerial
imagery, the images must first be interpreted; in interpreting imagery,
a map of the area is a very valuable tool. Both classes of activity
demand a basic level of image handling and map knowledge, much of which
can be successfully automated.

A key concept in our work is the use of a generalized digital map
to guide the process of image interpretation. This map is actually a
data base containing generic descriptions of objects and situations,
available imagery, and techniques, in addition to topographical and
cultural information found in conventional maps. The data base can be
used simply as a source of specific information about an area, as a
source of general information about relationships and characteristics of
classes of object, or in conjunction with imagery to answer questions
about dynamically varying facts.

We recognize that it is not possible to replace a skilled photo
interpreter within the limitations of the current state of image
understanding. It is possible, however, to facilitate his or her work
greatly by providing a number of collaborative aids that alleviate the
more mundane and tedious chores [1].* Our aims, therefore, have been:


* References  are  listed  at  the  end  of  the  report. 




• To identify specific ways in which computers can be of
assistance

• To develop interactive or automatic programs that
demonstrate feasibility

• To integrate the programs into a coherent framework that
can evolve toward the goal of automated photo
interpretation as more advanced capabilities are developed
and incorporated.

Our earlier work, described in previous progress reports [1] [2]
[3], concentrated on the first two goals. We have demonstrated useful
programs for a number of common tedious or arduous tasks, including
monitoring, measuring, counting, and tracing. The demonstration
programs were independent, however, sharing only data files. We are now
integrating basic and more advanced capabilities into a single coherent
system, known as Hawkeye.

B. Overview of Hawkeye System

Hawkeye is intended to provide a complete environment for a wide
variety of photo interpretive or cartographic situations. Like PACER
[1], it includes image-handling facilities, map display facilities, and
an on-line data base of intelligence information. In Hawkeye, however,
these facilities are all on line and tightly integrated. Many basic
photo interpretation functions can be performed interactively, on images
and maps in both printed and digital form. In addition, some selected
higher-level functions can be performed automatically by modules that
exploit knowledge in the data base to guide image interpretation.

The working area, shown in Figure 1, contains a video terminal for
communicating in conventional fashion with the computer, a color display
monitor with a trackball for presenting digitized pictures and maps, and
a digitizing table and cursor for using pictures and maps in document
form. Users communicate with Hawkeye naturally, in free-form English
and via interactive graphics.

The system organization is modular, the primary components being
display console and digitizing table utilities, a generalized map data
base, an image library, general image-analysis routines, task specialist
routines, and the necessary user interface. Table 1 outlines the
function of the various modules currently in use.

The following scenario illustrates the capabilities that will be
available when integration of the facilities we have already
demonstrated is complete.

A user entering Hawkeye encounters a task queue containing requests
(from hypothetical analysts). Retrieving a task, the user can query the
data base to determine the best available photo coverage. That image
can be retrieved and displayed and, if the image has been calibrated,
selected features from a digital map can be overlaid upon it. Relevant
reference documents in hard copy form (e.g., photos and maps) can be
mounted on the digitizing table and put into correspondence with the
digital data.

Correspondence with the map data base is established interactively
by designating control points in the uncalibrated image or map.
Calibration of digitized images could be performed automatically, with
the system selecting potentially visible landmarks (using navigational
data associated with the image) and then locating them in the image
using scene analysis techniques.

An important class of task is confirming the validity of existing
knowledge. The completed system will be able to verify automatically
the presence of certain cartographic features, such as roads and
waterways, and also to monitor the status of some typical dynamic
situations, such as ships berthed in harbor or boxcars stored in a
classification yard.

Another class of task is concerned with identifying new features
and adding them to the data base. This can be accomplished using
interactive aids for mensuration and tracing. Cartographic features
from printed documents must be digitized manually by tracing them on the
tablet. Elevations can be obtained automatically if they are available
in the terrain data base. Numerical annotations, such as the heights of
structures, are entered via a keyboard. When the source of information
is a digitized image, the user can invoke the interactive tracing aids
that require from the user only a crude guideline. Dimensional
measurements can be obtained interactively using various mensuration
aids.

[Table 1: HAWKEYE SYSTEM MODULES. Table body not recoverable from the scan.]

The user can obtain information from the data base to aid him in
his interpretations. The system can answer simple queries, such
as: "Show me Pier 14", "What is this building?", or "How high is that
mountain?" These queries are entered in English via a keyboard and via
a display cursor. In this way, the user can determine the identity of
known structures, or whether a given structure is new, obtain
intelligence background information, indicate features to the system, or
have the system indicate features (such as all the roads in an area).

The  analyst,  similarly,  could  query  the  data  base,  which  thus  serves  as 
a channel  of  communication  between  the  analyst  and  the  interpreter. 

The system can be programmed to answer more complex queries, such
as: "How many ships were in Oakland harbor yesterday?", by retrieving the
relevant image from the library and invoking an appropriate task
specialist. The system has the ability to accept such requests entered
remotely (say, by intelligence analysts) and execute them automatically
if it understands, or else relay them to the user (the photo
interpreter) for interactive execution.

The questions that can now be handled automatically are limited by
the present small size of the data base and the available specialist
routines, which automate tasks carefully chosen to exploit existing
primitive low-level vision capabilities. However, the capabilities
demonstrated so far do show the potential of bringing image
understanding and artificial intelligence approaches to bear on problems
in cartography and photo interpretation.


II THE HAWKEYE SYSTEM


A. System Organization

The Hawkeye system is necessarily a large complex of programs that
make considerable demands upon the supporting hardware and software. It
contains sufficient program and data to fill the available user address
space of 256k words on a DEC PDP-10 several times over. Very efficient
code is required for low-level mass processing of image arrays, but
flexible symbol manipulation is required for the higher levels of
processing. There must be efficient communication between the various
levels and subsystems. The system is highly interactive, using several
very different devices, with parts that must be sharable among several
users, while other parts must be user-specific.

Unfortunately, the requirements are not met by currently available
combinations of hardware and software. In particular, no single
existing programming language is truly adequate for advanced image
understanding work; hence it is necessary to program in a mixture of
languages. Because of this, and because of its total size, the system
must be partitioned into subprograms, or modules. We have attempted to
modularize the Hawkeye system in a way that retains as much flexibility
as possible, while sacrificing as little power as possible. The modules
correspond to parts of the system that perform a distinct set of
operations, are programmed in a single language, and communicate with
the user mainly via one device. Some modules (e.g., the data base)
could also be shared by several users.

The Hawkeye system currently consists of the modules outlined in
Table 1. The various modules and their capabilities are discussed in
more detail later. In this section, we address the problem of overall
organization of the system.




The modules of the Hawkeye system are implemented as independent
processes (forks), each written in an appropriate language (INTERLISP,
SAIL, FORTRAN, or MACRO) with its own data structures. Processes
interact by sending and receiving interprocess messages, and are
organized as shown in Figure 2. In an early version of the system,
interaction was purely vertical, between the user interface and a
particular subservient module. We found, however, that such a scheme
was not flexible enough.

Each process performs a specific set of functions, either at the
request of other Hawkeye processes (e.g., the display handler) or in the
direct interests of the human user (e.g., a top-level task specialist,
such as the railyard monitor). The former processes are "server
processes," whereas the latter may be classified as "user processes"
(although occasionally some processes might behave as both user and
server). A server process is associated with each external connection
(i.e., device or data base). Each server presents a standard interface
to the rest of the system. Thus, knowledge of the idiosyncrasies of a
particular device or data base is required only within the process
dedicated to it.

The user communicates with the system primarily via the user
interface module (written in INTERLISP), which then calls upon
appropriate server modules to carry out his request. The user interface
module is also responsible for initializing the server modules when the
system is first started.

This organization of the system is very convenient. It permits
easy extension of the system by adding modules, and even the
incorporation of independent programs within the Hawkeye framework. It
also permits using less than the full system (e.g., just the user
interface and data base) or sharing of modules among users, if desired.
Such flexibility is valuable both for building and using complex
systems.

The following sections describe intermodule communication and the
supporting modules in more detail.




A conventional self-contained program of modest size uses two
principal routes for passing information among its components: via
arguments to subroutines, and via global variables. However, a large
system that consists of many modules that are essentially independent
programs cannot use these methods for intermodule communication, since
modules occupy different address spaces. In this case, modules must
communicate either via shared files or whatever other means are
available in the operating system for this purpose.

The TENEX and TOPS-20 operating systems provide several methods of
communication between programs: files, pseudoteletypes, shared pages,
and the Inter-Process Communication Facility (TOPS-20 only). Files have
the disadvantage of being primarily sequential, and of entailing
significant overheads for opening and closing. Pseudoteletypes are
strictly sequential, character-passing channels. They enable the two
modules concerned to treat communication like terminal I/O, but require
parsing routines at each end to translate from text format to internal
format.

Our first version of the Hawkeye system was implemented under TENEX
and made use of the page-sharing facility. Page sharing is really
sharing a block of memory, or sharing global variables. An additional
facility is needed to permit control to be passed between modules: In
this case, the fork (subjob) machinery was used. The system
organization was hierarchical, with a top-level module and a set of
inferior modules (forks). Inferior modules were treated much like
subroutines, with arguments being passed via shared pages and control
being handed temporarily to the inferior module, which eventually
returned it to the top-level module.

When the system became more complex, the hierarchical organization
became insufficient: It was necessary for control and information to be
passed among the inferior modules. We contemplated extending the
shared-page scheme, but there were a number of problems: Allocating one
shared page for each pair of modules is expensive and makes system
extension difficult, while allocating one (or more) pages common to all
modules leads to interlock and scheduling difficulties. Sharing pages
does not readily permit communication between concurrent processes,
which requires some form of queueing of messages.

In the current version of Hawkeye, interprocess communication is
implemented using the Inter-Process Communication Facility (IPCF) of the
TOPS-20 operating system. This facility provides significantly better
interaction than the pseudo-teletype and shared-page facilities to which
we were limited under TENEX. IPCF enables processes (including forks
and jobs) to send and receive messages in the form of packets, up to a
page (512 words) in length. Messages are copied from the sender's
address space and placed in an input queue in system space. The packet
remains in the queue until the receiver requests it, at which time it is
copied to the receiver's address space.

Messages consist primarily of requests and responses with the
format:


Source ID : Destination ID : Message ID : Message Data


The source and destination IDs are the unique process identifiers
assigned to the associated processes at the time they were created.

Each module may contain several processes that are independent or that
share memory, subroutines, or other facilities. In Hawkeye, the message
ID conveys the type of processing and information required of the
receiver, and thus implicitly how the message should be unpacked (i.e.,
as integers, reals, characters, or some mixture).
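The message format can be illustrated with a minimal sketch in Python. It is not the Hawkeye implementation: the message IDs, their meanings, and the use of byte strings in place of PDP-10 words are all assumptions made for the example. The point it shows is that the message ID alone tells the receiver how to unpack the data.

```python
import struct
from dataclasses import dataclass

# Hypothetical message IDs. Each one implies a layout for the message data,
# so the receiver needs no further description of what it was sent.
FORMATS = {
    1: "<2i",  # e.g. a cursor position: two integers
    2: "<3d",  # e.g. a world location: three reals
    3: "<8s",  # e.g. a short name: eight characters
}


@dataclass
class Message:
    source_id: int    # process that sent the message
    dest_id: int      # process that should receive it
    message_id: int   # selects the processing and the data layout
    data: bytes       # packed message data


def pack_message(source_id, dest_id, message_id, *values):
    """Pack the values according to the layout implied by message_id."""
    data = struct.pack(FORMATS[message_id], *values)
    return Message(source_id, dest_id, message_id, data)


def unpack_message(msg):
    """Recover the values, using only the message ID to choose the layout."""
    return struct.unpack(FORMATS[msg.message_id], msg.data)
```

For example, `unpack_message(pack_message(10, 20, 1, 128, 64))` gives back the two integers as a tuple.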

Once a request is posted, the requester (sender) usually waits
until the requestee (receiver) has finished processing the request. The
requestee, on the other hand, is typically a server and operates by
waiting for a request, processing it to completion, sending a response,
waiting for the next request, and so on. Servers are not interrupted by
incoming requests while processing a previous request. However, they
may themselves send requests to other servers and await replies before
proceeding. For this purpose, each process has two channels of
communication, each with its own ID: One channel is for receiving
requests and sending responses, and the other is for sending its own
requests and receiving answers. A process may choose whether it will
respond to requests while awaiting an answer, but it should do so if
a recursive situation can arise, since otherwise the processes involved
could wait on one another indefinitely. Processes are used in Hawkeye
not so much to achieve parallelism, but to acquire a larger address
space and to integrate multiple languages.
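The request and response discipline described above can be sketched in Python as follows. This is only an illustration under stated assumptions: Python threads and in-memory queues stand in for TOPS-20 processes and IPCF packet queues, and the process names, the "double" request, and the "stop" request are hypothetical.

```python
import queue
import threading


class Process:
    """A process with two channels: one for requests, one for answers."""

    def __init__(self, name):
        self.name = name
        self.requests = queue.Queue()  # requests arriving from other processes
        self.replies = queue.Queue()   # answers to this process's own requests

    def call(self, server, message_id, data):
        """Post a request to a server and wait until it has answered."""
        server.requests.put((self, message_id, data))
        return self.replies.get()


def serve(server, handlers):
    """Server loop: take a request, process it to completion, respond, repeat."""
    while True:
        sender, message_id, data = server.requests.get()
        if message_id == "stop":  # hypothetical shutdown request
            sender.replies.put("stopped")
            return
        sender.replies.put(handlers[message_id](data))


def run_demo():
    display = Process("display")      # hypothetical server process
    user = Process("user-interface")  # hypothetical requesting process
    handlers = {"double": lambda x: 2 * x}
    worker = threading.Thread(target=serve, args=(display, handlers))
    worker.start()
    answer = user.call(display, "double", 21)
    user.call(display, "stop", None)
    worker.join()
    return answer
```

Because answers arrive on a separate channel from requests, a server in this sketch could itself call another server from inside a handler without confusing the reply with a new incoming request.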

Interface routines are needed to match the basic message-passing
machinery to the various programming languages used in implementing
Hawkeye modules. Each module must know what computations can be
performed by other modules, and the data format of the message. The
send interfacing routines construct a message, packing the required
information, possibly after type-conversion. The receive interfacing
routines unpack the information and call an appropriate subroutine. For
example, LISP interfacing routines will accept arbitrary objects (e.g.,
lists, arrays, numbers) for transmission, convert them to the form
expected by the receiver, pack, and transmit them. One such routine
accepts any LISP S-expression and transmits it to any other LISP
process, which then evaluates it and returns the result. The user
interface module, for example, can construct S-expressions from English
input and transmit them to the data base module for evaluation.

It appears that the Inter-Process Communication Facility, with our
extensions to it, is powerful, flexible, and convenient. It has the
additional advantage that it permits communication between processes in
different time-sharing jobs. Thus, it is possible for users to share
facilities (such as the data base) and to communicate among themselves.
For example, it would be possible for two photo interpreters using the
IPCF machinery to exchange example images and even to converse about
them.




B.  Display  Server 


The display server is the channel through which all graphical
information communicated via the display must pass. Not only does this
restriction avoid duplication of display-handling code in other modules,
but it centralizes knowledge of the state of the display screen. The
display server can allocate subsets of the display capabilities to
requesting processes and keep track of what is visible to the user:
Bookkeeping of this sort is essential, for example, when the user is
attempting to point at a feature displayed on the screen.

The basic capabilities of the display hardware are as follows: The
screen area is 256x256 pixels, and grey-scale images can be displayed
with four bits of brightness resolution in red, green, or blue. Images
may thus be displayed in color or monochrome. There are also two
one-bit overlays in red and green. Vectors may be drawn in any
combination of colors, including the overlays. A trackball is also
available for locating a point on the display screen or tracing a
sequence of points.

The user, or a process, can define display windows. A display
window is a rectangular portion of the display screen, usually indicated
by pointing with the trackball cursor, and some subset of the available
colors (Figure 3). Grey-scale images or line drawings may be
subsequently displayed within the confines of the window, without
affecting the rest of the visible display. The user may, for example,
use one window as an inset for displaying an enlarged portion of an
image (Figure 4), or two windows for displaying two different images,
and may overlay the map on any of them. When several windows overlap,
the most recently displayed image overwrites the others, and the display
server records this fact by keeping a list of the windows in use,
ordered by recency of display (Figure 5). When the user points with
the trackball cursor, the server searches this list until it finds the
most recently displayed window that contains the cursor location. Thus
it can determine which point in which image was intended. The window
facility lets the user operate as though a collection of photos were on
a desk, from which one can be pulled and placed on top for examination,
or comparison with another, while fragments of the remainder remain
visible.
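The recency-ordered window list can be sketched in a few lines of Python. This is an illustrative sketch, not the display server's code: the class names, the rectangle convention (upper bounds exclusive), and the window names in the usage note are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Window:
    name: str
    x0: int  # left edge, inclusive
    y0: int  # top edge, inclusive
    x1: int  # right edge, exclusive
    y1: int  # bottom edge, exclusive

    def contains(self, x, y):
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


class WindowList:
    """Windows in use, ordered by recency of display (most recent first)."""

    def __init__(self):
        self._windows = []

    def display(self, window):
        """Record that a window has just been (re)displayed on top."""
        if window in self._windows:
            self._windows.remove(window)
        self._windows.insert(0, window)

    def window_at(self, x, y):
        """Return the most recently displayed window containing (x, y)."""
        for window in self._windows:
            if window.contains(x, y):
                return window
        return None
```

If window "photo-a" covers (0, 0) to (128, 128) and "photo-b", displayed later, covers (64, 64) to (256, 256), a cursor at (100, 100) resolves to "photo-b"; redisplaying "photo-a" moves it to the front, and the same cursor position then resolves to "photo-a".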

When the user asks for a picture or a portion of a picture to be
displayed in a particular window, the server determines whether to
expand (by replicating pixels) or reduce (by sampling pixels) the image
to fit the window. When the image is thus displayed, the picture source
file is associated with the window, so that the user may subsequently
have the window redisplayed, either naming it, or pointing to it. The
user may also, for example, have an enlarged portion of any window
displayed, pointing with the cursor to a destination window, and to the
feature to be enlarged.
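Expansion by pixel replication and reduction by pixel sampling can both be expressed as one nearest-pixel lookup, as in the Python sketch below. The function name and the representation of an image as a list of rows are assumptions for the example; the report does not say how the server chooses which pixels to sample.

```python
def fit_to_window(image, win_rows, win_cols):
    """Resize an image (a list of rows) to the window size.

    Each window pixel takes the value of the source pixel it maps back to.
    When the window is larger than the image, source pixels are replicated;
    when it is smaller, source pixels are sampled.
    """
    src_rows, src_cols = len(image), len(image[0])
    return [
        [
            image[r * src_rows // win_rows][c * src_cols // win_cols]
            for c in range(win_cols)
        ]
        for r in range(win_rows)
    ]
```

Fitting a 2x2 image into a 4x4 window repeats each pixel in a 2x2 block, while fitting a 4x4 image into a 2x2 window keeps every second pixel in each direction.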

Each display window has associated with it two coordinate frames in
addition to the absolute coordinate frame of the display hardware: One
is a logical-coordinate frame, the other a world-coordinate frame. The
logical coordinates relate a point in the window to the digitized
picture from which the displayed image was derived. Thus, the logical
coordinates of a feature on the screen will be the same, no matter where
the display window is, or whether the image it presents is an
enlargement or a reduction of the original. The logical coordinates are
defined when the command is given to display a particular picture in a
particular window. The world-coordinate frame is the UTM grid. When a
digitized picture has been calibrated and the resulting transformation
information filed, the appropriate transform can be read in and
associated with the display window, so that locations of features in the
image may be given to the user in UTM grid locations with elevations.
Since the imaging process loses one dimension, there is a one-
dimensional ambiguity in specifying the world location of the feature.
This ambiguity is removed either by asking the user to specify elevation
or by retrieving the elevation automatically (when it is available) from
the terrain data base.
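
The idea of logical coordinates can be sketched as follows. This is a
minimal Python illustration under stated assumptions: the class
DisplayWindow, its fields, and the uniform scale factor are hypothetical
and are not taken from the Hawkeye implementation.

```python
from dataclasses import dataclass


@dataclass
class DisplayWindow:
    screen_x: float    # display-hardware coordinates of the window corner
    screen_y: float
    picture_x: float   # picture coordinates shown at that corner
    picture_y: float
    scale: float       # screen pixels per picture pixel (>1 enlarges)

    def to_logical(self, sx, sy):
        """Screen position -> coordinates in the digitized picture."""
        return (self.picture_x + (sx - self.screen_x) / self.scale,
                self.picture_y + (sy - self.screen_y) / self.scale)

    def to_screen(self, px, py):
        """Picture coordinates -> screen position in this window."""
        return (self.screen_x + (px - self.picture_x) * self.scale,
                self.screen_y + (py - self.picture_y) * self.scale)


enlarged = DisplayWindow(0, 0, 100, 50, 2.0)
reduced = DisplayWindow(300, 200, 0, 0, 0.5)
feature = (120.0, 80.0)   # one feature, in picture coordinates
```

The same feature lands at different screen positions in the two windows,
but to_logical returns the same picture coordinates from either one.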

The  display  server,  through  the  basic  capabilities  described  above, 
makes  it  possible  for  the  user  to  point  at  locations  in  the  image  and  be 
informed of their world location. Alternatively, the Hawkeye system can
automatically identify the feature by inquiring of the data base the
known features near that location. Conversely, since the relation
between world and displayed image is known for each window, it is
possible to determine where any feature in the map data base should
appear in the image, and hence to overlay it on the display. Having the
system respond to the twin queries: What road is that? and Where is
road X? can be very useful to a photo interpreter.

C. Graphics Tablet Server

Two graphics tablets are used with the Hawkeye system: The larger
one has a 3- by 4-foot digitization area; the other has an 11-inch
square digitization area. A single cursor may be used with either
tablet to encode the location of a point to 0.001-inch resolution (with
0.005-inch accuracy). The tablets permit the photo interpreter to use
documents such as standard printed maps or original photos in
conjunction with digital data.

The graphics tablet server is essentially a restricted version of
the display server. The user can fasten several documents to the large
tablet (e.g., a conventional topographic map and several aerial photos)
and define a tablet window (a rectangular region of tablet surface) for
each. When the user indicates a point with the cursor, the system can
determine which window is indicated, and hence which document is being
used. Tablet windows, like display windows, may overlap. For
determining which window is indicated, however, the internal
representation orders windows by recency of creation, rather than
recency of use, since newer documents are usually fastened on top of
older ones.
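
The window lookup described above might be sketched as follows. This
Python fragment is illustrative only; TabletWindow, window_at, and the
sample coordinates (in inches) are hypothetical names and values, not
part of the Hawkeye code.

```python
from dataclasses import dataclass


@dataclass
class TabletWindow:
    name: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x, y):
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def window_at(windows, x, y):
    """Return the most recently created window containing (x, y), or None.

    `windows` is kept in creation order, oldest first, so the search
    runs from the end of the list.
    """
    for window in reversed(windows):
        if window.contains(x, y):
            return window
    return None


windows = [
    TabletWindow("topographic map", 0.0, 0.0, 30.0, 24.0),
    TabletWindow("aerial photo", 20.0, 10.0, 29.0, 19.0),  # fastened on top
]
```

A point inside both rectangles resolves to the aerial photo, because it
was defined later and is assumed to lie on top of the map.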

The  system  can  indicate  a point  on  a document  by  printing  the 
tablet  coordinates  of  the  point.  The  user  may  then  move  the  cursor 
until  the  numerical  readout  on  the  tablet  control  unit  displays  the 
specified  location.  Thus,  the  system  can  use  the  tablet  to  respond  to 
such questions as: Where is X on this map?


Each tablet window, like a display window, may have logical- and
world-coordinate frames associated with it. The logical coordinate
transformation must be determined for each tablet window individually by
indicating calibration points with the cursor and typing the
corresponding logical coordinates. For example, a tablet window
enclosing a topographic map can be calibrated so that points indicated
by the user are converted directly to UTM coordinates. Documents that
do not represent a simple orthogonal projection of the world, such as
photos, can also be calibrated in terms of camera parameters. It is
then possible, using hardcopy photos, to obtain world locations of
indicated features, to perform mensuration tasks, and to determine image
features corresponding to a given set of coordinates in much the same
way as with digitized images on the display.

Since we handle tablet windows and display windows in a consistent
way, it is possible to mix digitized images and printed documents
freely. For example, a new digitized image may be calibrated by
pointing to corresponding features on the display and on a topographic
map.

Conventional printed maps can be used with Hawkeye for two
purposes: to supplement or substitute for the digital map in areas for
which maps in digital form are not yet available and to input
information to the map data base. For the latter purpose, we have
written programs that allow the user to trace features directly from the
topographic map, have them displayed for verification and possible
editing, and then incorporated into the map data base. Elevations must
currently be entered manually when tracing from a map. However, we
should soon be able to obtain this third dimension automatically from
digital terrain data, when they become available.




D.  Calibration  Server 

The calibration server provides routines to compute the
transformations used in coordinate conversions, such as between UTM map
coordinates and digitized picture coordinates. It also provides the
basic mensuration functions, such as height or distance determination,
which rely on the coordinate conversions.

The coordinate systems that are commonly used in Hawkeye include
graphics tablet coordinates, UTM coordinates, relative UTM coordinates,
film plane coordinates, digitized picture coordinates, and display
coordinates. Coordinates are represented as homogeneous vectors, and
the transformations that map between them are represented as 4x4
matrices. This representation is convenient because it can be used to
represent both linear and perspective transformations.
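
To make the convenience of the homogeneous representation concrete, the
following Python sketch applies 4x4 matrices to three-dimensional
points. It is an illustration under assumptions: the helper names
(apply, translation, perspective) and the pinhole-camera convention are
chosen for this example and are not the calibration server's own.

```python
def apply(matrix, point):
    """Apply a 4x4 matrix to a 3-D point held as a homogeneous vector."""
    x, y, z = point
    vec = (x, y, z, 1.0)
    hx, hy, hz, hw = (sum(m * v for m, v in zip(row, vec)) for row in matrix)
    # Dividing by the fourth component is what lets one matrix format
    # express perspective as well as linear maps.
    return (hx / hw, hy / hw, hz / hw)


def translation(tx, ty, tz):
    """A shift, which is linear in homogeneous coordinates."""
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def perspective(f):
    """Pinhole projection onto the plane z = f (camera at the origin)."""
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 1.0 / f, 0]]


shifted = apply(translation(1, 2, 3), (1.0, 1.0, 1.0))
projected = apply(perspective(2.0), (4.0, 2.0, 8.0))
```

Both kinds of transformation are applied by the same routine, and chains
of transformations can be combined by ordinary matrix multiplication.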

The basic calibration routine computes a best estimate of the
transformation, in a least-squares sense, between two lists of
corresponding coordinates. This optimization routine is used in
conjunction with a variety of other modules concerned with determining
corresponding calibration points in different coordinate systems. To
facilitate experimentation, an interactive routine is available that
allows the user to specify the source of each list of points (graphics
tablet, display, or file) and the type of transformation (digitization,
perspective, or collineation), and (if appropriate) to indicate control
points on the designated input devices. The interactive routine then
calls the optimizer to compute an estimate of the transformation and
displays the residuals. If the user is not satisfied with the
precision, the two lists of points can be edited and the computation
tried again.
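
A least-squares fit with residuals can be sketched in a few lines. The
Python example below is deliberately simplified: it fits a
two-dimensional affine transformation by solving the normal equations,
whereas the report's routine handles digitization, perspective, and
collineation transformations. All function names are hypothetical.

```python
import math


def solve3(a, b):
    """Solve the 3x3 system a.x = b by Gaussian elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s = sum(m[r][k] * x[k] for k in range(r + 1, 3))
        x[r] = (m[r][3] - s) / m[r][r]
    return x


def fit_affine(src, dst):
    """Least-squares [[a, b, c], [d, e, f]] with u = ax+by+c, v = dx+ey+f."""
    rows = [(x, y, 1.0) for x, y in src]
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
              for i in range(3)]
    coeffs = []
    for axis in range(2):
        rhs = [sum(r[i] * d[axis] for r, d in zip(rows, dst))
               for i in range(3)]
        coeffs.append(solve3(normal, rhs))
    return coeffs


def residuals(coeffs, src, dst):
    """Distance between each transformed source point and its target."""
    out = []
    for (x, y), (u, v) in zip(src, dst):
        pu = coeffs[0][0] * x + coeffs[0][1] * y + coeffs[0][2]
        pv = coeffs[1][0] * x + coeffs[1][1] * y + coeffs[1][2]
        out.append(math.hypot(pu - u, pv - v))
    return out


tablet = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
picture = [(10.0, -5.0), (12.0, -5.0), (10.0, -2.0), (12.0, -2.0)]
transform = fit_affine(tablet, picture)
errors = residuals(transform, tablet, picture)
```

Large entries in the residual list point to badly indicated control
points, which is the cue for editing the lists and refitting.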

The mensuration primitives allow the user to measure distances in
terms of any of the above coordinate systems, e.g., the length of a ship
or the height of a building. The routines are general enough so that the
user can indicate each point in a different window (graphics tablet or
display). For example, it is possible to measure the distance between a
point on a map currently on the graphics tablet and a point in a
digitized picture currently being displayed.




E. Map Data Base Server


The data base server is the means of access to the map data base
for those system modules that are not required to have detailed
knowledge of its structure or implementation. This server contains
access routines for answering a variety of standard queries about
specific data and the general format of the data base; for example: What
is at (x,y,z)? What is the closest road to (x,y,z)? What roads are
contained in the area bounded by ...? Where is Oakland Mole? What is
the <attribute> of <object>? What is known about <object>?

The map data base contains three-dimensional descriptions of
cartographic and cultural features, including coastlines, major roads,
lakes, bridges, airfield runways, oil storage tanks, and harbor lights.
In addition, the map contains a partial taxonomy of world entities, with
relevant general semantics, information about available imagery, and
descriptions of data structures used by the system. The information
about imagery includes file name, calibration data, and geographic area
covered and can be used in selecting appropriate pictures for specific
tasks. The descriptions of the data structures enable the system
automatically to construct (instantiate) new entities of the correct
structure for inclusion in the data base.

The map data base is a disk-based semantic net data structure that
can contain realistic quantities of data represented in a way that
permits efficient access. Entities are represented by LISP atoms (e.g.,
English words); information associated with the entity is stored in a
property list format. Relationships to other entities are also stored
on the property lists, thus establishing a network structure in the data
base. When information concerning a particular entity is sought, the
property list is retrieved from disk and established in core. A
"paging" scheme limits the amount of data in core (to, say, 1000
entities) and writes entities back out to disk, if necessary, the least
recently used ones first [2]. Retrieval of the information is by means
of a hash table on disk, which means that access time is constant and
independent of data base size. The geometric data are indexed (the
index structure is part of the data base) via K-D trees [4], one tree
for each class of entity sought, to enable fast retrieval of information
relevant to a particular area. Further details of data base
implementation are to be found in Appendix A, and examples of semantic
organization are given in [2].
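
The paging scheme can be illustrated with a small least-recently-used
cache. This Python sketch is an assumption-laden stand-in: a dictionary
plays the part of the on-disk hash table, and the class name EntityStore
and the sample entities are invented for the example.

```python
from collections import OrderedDict


class EntityStore:
    def __init__(self, disk, capacity=1000):
        self.disk = disk            # stands in for the hash table on disk
        self.capacity = capacity    # maximum number of entities in core
        self.core = OrderedDict()   # name -> property list, oldest first

    def get(self, name):
        """Return the property list for `name`, paging it in if needed."""
        if name in self.core:
            self.core.move_to_end(name)          # now most recently used
        else:
            self.core[name] = dict(self.disk[name])
            if len(self.core) > self.capacity:
                # Write the least recently used entity back out to disk.
                old_name, old_props = self.core.popitem(last=False)
                self.disk[old_name] = old_props
        return self.core[name]


disk = {
    "BAY.BRIDGE": {"TYPE": "BRIDGE"},
    "HIGHWAY.101": {"TYPE": "ROAD"},
    "OAKLAND": {"TYPE": "CITY"},
}
store = EntityStore(disk, capacity=2)
store.get("BAY.BRIDGE")
store.get("HIGHWAY.101")["LANES"] = 4   # modified while in core
store.get("BAY.BRIDGE")
store.get("OAKLAND")                    # forces HIGHWAY.101 out to disk
```

Changes made to an entity in core reach the disk only when it is paged
out, which keeps repeated accesses to active entities fast.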

We are setting up a map of the San Francisco Bay Area, containing
major features, coastlines, bridges, and highways. Figure 6 is a
portion of a U.S. Geological Survey (USGS) map of the area; Figure 7
shows the portion of the map currently in the data base. Figure 8
shows part of the map at higher resolution. The map consists of about
4000 points, plus various semantic relationships, totaling about three-
quarters of a million bytes of disk storage. (Access to a particular
item of information takes less than 1 ms if it is paged in, and 15 to 30
ms plus disk access time if it must be read in.) The map information is
entered by manually tracing features on a USGS map using a digitizing
table: map data in digital form are not readily available, and because
the problem of digitizing printed maps has rather different constraints
from the problem of making maps from photographs, we could not exploit
our guided tracing techniques.

F.  Terrain  Data  Base  Server 

Information  about  the  detailed  topography  of  the  San  Francisco  Bay 
Area  is  available  as  an  array  of  elevations  over  a grid  with  200-ft 
spacing  between  samples.  The  amount  of  data  involved  is  very  large 
(several million bytes in its original format); even so, it is necessary
to  interpolate  to  estimate  elevation  for  an  arbitrary  location. 

The  terrain-data-base  server  performs  two  main  functions.  The  most 
basic function simply returns the elevation of a location specified in
UTM  grid  coordinates.  To  do  so  requires  performing  a local 
interpolation  (currently  bilinear). 
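
Bilinear interpolation over a regular elevation grid can be sketched as
follows. The Python function below is illustrative: its name, the
row-major grid layout, and the use of coordinates relative to the grid
origin are assumptions, and the point is assumed to lie within the grid.

```python
import math


def elevation(grid, x, y, spacing=200.0):
    """Interpolate the elevation at (x, y), in feet from the grid origin.

    grid[row][col] holds the elevation at (col * spacing, row * spacing).
    """
    gx, gy = x / spacing, y / spacing
    # Clamp so that points on the far edge use the last full grid cell.
    col = min(int(math.floor(gx)), len(grid[0]) - 2)
    row = min(int(math.floor(gy)), len(grid) - 2)
    fx, fy = gx - col, gy - row
    near = grid[row][col] * (1 - fx) + grid[row][col + 1] * fx
    far = grid[row + 1][col] * (1 - fx) + grid[row + 1][col + 1] * fx
    return near * (1 - fy) + far * fy


samples = [[0.0, 100.0],
           [200.0, 300.0]]
center = elevation(samples, 100.0, 100.0)   # midway between all four
```

The result is exact at the grid points and varies linearly along the
edges of each cell.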

The second function is, given two locations in three-dimensional
coordinates (a viewpoint and a world point) that specify a line of
sight, to determine the point on the ground that will be seen looking
along that line of sight. This function is important, because it
enables estimation of the third dimension that is lost when an image is
formed: Each point in a calibrated image specifies only a line of sight,
and the terrain information resolves the implied ambiguity.

FIGURE 7 DISPLAY OF THE DIGITAL MAP DATA BASE

The  terrain-data-base  server  can  be  used  whenever  locations  or 
distances  are  estimated  from  an  image.  For  example,  when  measuring  the 
height of a vertical structure by pointing to its top and bottom, the
terrain  data  specify  the  location  of  the  bottom,  and  thereby  set  the 
scale  of  the  structure. 
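
One plausible way to find the ground point along a line of sight is to
step along the ray until it passes below the terrain and then refine the
crossing by bisection. The Python sketch below takes that approach; the
report does not say how the server does it, and the names ground_point
and ground, the step size, and the range limit are all assumptions.

```python
import math


def ground_point(viewpoint, target, ground, step=10.0, max_range=100000.0):
    """Return the first point where the ray from viewpoint through target
    meets the terrain, or None if it does not within max_range.

    ground(x, y) returns terrain elevation; the viewpoint is assumed to
    be above the ground.
    """
    vx, vy, vz = viewpoint
    dx, dy, dz = (t - v for t, v in zip(target, viewpoint))
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    dx, dy, dz = dx / length, dy / length, dz / length

    def clearance(t):
        """Height of the ray above the terrain at distance t."""
        return (vz + t * dz) - ground(vx + t * dx, vy + t * dy)

    t = 0.0
    while t < max_range:
        if clearance(t + step) <= 0.0:
            lo, hi = t, t + step
            for _ in range(40):              # bisect the crossing
                mid = (lo + hi) / 2.0
                if clearance(mid) > 0.0:
                    lo = mid
                else:
                    hi = mid
            return (vx + hi * dx, vy + hi * dy, vz + hi * dz)
        t += step
    return None


flat = ground_point((0.0, 0.0, 1000.0), (100.0, 0.0, 900.0),
                    lambda x, y: 0.0)
slope = ground_point((0.0, 0.0, 1000.0), (100.0, 0.0, 900.0),
                     lambda x, y: 0.5 * x)
```

In practice the ground function would be the interpolated terrain
lookup; a step larger than the terrain detail could skip a narrow ridge.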

The utility of terrain data is limited by their accuracy. We
obtained our Bay Area data from the National Cartographic Information
Center, Reston. The data base was originally created by tracing
contours on a 1:250,000 scale map and then interpolating between
contours to estimate elevations at grid points. There are thus three
classes of errors: those existing in the original topographic map, those
arising during tracing, and those resulting from interpolation. We have
already become aware of peculiarities in the data resulting from the
interpolation algorithm, but we are not yet able to assess their
seriousness. We have, however, discovered gross errors apparently owing
to faulty copying of the digital data before distribution. For these
reasons, the terrain information used in the Hawkeye system is
considerably less accurate than it need be. An operational photo
interpretation system certainly could have access to more precise and
more detailed information.

G.  User  Interface 

When a system becomes large and complex, it may become
difficult for the user to remember the command format, qualifications,
and consequences for all its components, particularly if some are
infrequently used. The situation is even worse if the system may be
used by those who only encounter it occasionally, rather than spending
their working day with it. During system development, when several
people are experimenting with and modifying various pieces, no
individual has complete knowledge of all aspects of the system.

To facilitate interaction with the Hawkeye system, we have
implemented a user interface module that provides two main facilities:
natural language communication and a help facility. Natural language
(more accurately, a subset of it) gives the user considerable
flexibility in structuring queries or commands so that he or she is not
restricted by any difficulty in recalling a formal syntax. The help
facility enables the user to find out the range of capabilities and data
possessed by the system. The user interface also includes task queueing
and reporting facilities to simulate a photo interpreting environment,
such as that provided by the PACEH system.

2.  Natural  Language  Communication 

The user interface is implemented with LIFER, a proprietary
language definition and parsing system written in LISP and developed at
SRI by Hendrix [5]. LIFER interfaces have been designed for several
large AI programs, including the ACCAT test bed supported by ARPA [6].
LIFER makes it particularly easy to achieve dialogs in a limited
domain, facilitated by such features as acceptance of elliptical input
and the capability for the user to expand the grammar incrementally as
deficiencies are discovered.

LIFER  uses  an  augmented  transition  net  grammar  whose  symbols 
correspond  to  semantic  as  well  as  syntactic  entities.  The  language 
designer  provides  a collection  of  rules,  each  of  which  consists  of  a 
pattern  and  a response  expression.  For  example: 

    PATTERN.DEFINE( (WHAT IS THE DATE) (PRINT (DATE)) )

will cause the expression (PRINT (DATE)) to be evaluated when the user
types: What is the date?

The  designer  can  also  define  patterns  that  use  metasymbols  to 
stand  for  subexpressions.  For  example: 

    PATTERN.DEFINE( (WHERE IS <PLACE>)
                    (PRINT (LOCATION <PLACE>)) )

will print the location of whatever is bound to the metasymbol <PLACE>
as a result of the pattern matching the input. If the user types: Where
is Oakland?, and <PLACE> matches "Oakland," then the expression
(PRINT (LOCATION 'OAKLAND)) is evaluated.

A metasymbol can be defined by one or more of an explicit
set, a predicate, or a subgrammar. A set can be defined by, for
example,

    MAKE.SET( <PLACE> (SRI OAKLAND ALAMEDA) ),

which makes SRI, OAKLAND, and ALAMEDA acceptable places.

A predicate is similarly defined:

    MAKE.PREDICATE( <PLACE> PLACE.TESTER ),

where PLACE.TESTER is the name of a function that can decide whether to
accept a given word as a place name.

Subgrammars are defined in terms of patterns and response
expressions, using PATTERN.DEFINE:

    PATTERN.DEFINE( <PLACE>
                    (THE CITY HALL OF <CITY>)
                    (CITY.HALL <CITY>) )

The above definitions would accept: Where is the city hall of
Sacramento? and return the result of evaluating
(LOCATION (CITY.HALL 'SACRAMENTO)), assuming that <CITY> were also
defined.
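
The way sets, predicates, and subgrammars combine can be illustrated
with a small matcher. The Python sketch below is not LIFER, which is a
LISP system built on a transition net grammar; it is a simple top-down
matcher with hypothetical names (match, expand, parse) that mirrors the
three kinds of metasymbol definition and returns the response as a
nested tuple rather than evaluating it.

```python
def match(pattern, words, grammar):
    """Yield (bindings, remaining words) for each match of a prefix."""
    if not pattern:
        yield {}, words
        return
    head, rest = pattern[0], pattern[1:]
    if head.startswith("<"):                       # a metasymbol
        for value, remaining in expand(head, words, grammar):
            for bindings, tail in match(rest, remaining, grammar):
                yield {head: value, **bindings}, tail
    elif words and words[0] == head:               # a literal word
        yield from match(rest, words[1:], grammar)


def expand(symbol, words, grammar):
    """Yield (value, remaining words) for each way a metasymbol matches."""
    for kind, spec, *response in grammar[symbol]:
        if kind == "set" and words and words[0] in spec:
            yield words[0], words[1:]
        elif kind == "predicate" and words and spec(words[0]):
            yield words[0], words[1:]
        elif kind == "pattern":                    # a subgrammar
            for bindings, remaining in match(spec, words, grammar):
                yield response[0](bindings), remaining


def parse(words, top_patterns, grammar):
    """Return the response for the first pattern matching all the words."""
    for pattern, response in top_patterns:
        for bindings, remaining in match(pattern, words, grammar):
            if not remaining:
                return response(bindings)
    return None


grammar = {
    "<CITY>": [("set", {"SACRAMENTO", "OAKLAND"})],
    "<PLACE>": [
        ("set", {"SRI", "OAKLAND", "ALAMEDA"}),
        ("pattern", ["THE", "CITY", "HALL", "OF", "<CITY>"],
         lambda b: ("CITY.HALL", b["<CITY>"])),
    ],
}
top = [(["WHERE", "IS", "<PLACE>"], lambda b: ("LOCATION", b["<PLACE>"]))]

answer = parse("WHERE IS THE CITY HALL OF SACRAMENTO".split(), top, grammar)
```

Each definition for a metasymbol is tried as an alternative, so adding a
new entry to the list extends the grammar without replacing anything.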

Multiple patterns may be defined for the grammar or a
subgrammar. Each pattern definition is added as an alternative to (not
a replacement for) what already exists. Subgrammars may be recursive or
mutually recursive, so that complex phrases, such as "the nearest road
to the tallest building visible in the current photo," may readily be
handled.

Note that the subgrammars <PLACE> and <CITY> in the examples
above have semantic as well as syntactic content. In a conventional
grammar, both might have been replaced by <NOUN.PHRASE>, but this would
have two significant disadvantages: First, the grammar would accept many
more nonsense sentences because it would not be able to make fine
semantic distinctions (recall "colorless green ideas"). Second, it
would not be able to make different responses to semantically different
but syntactically similar input. In addition to embedding semantics in
the patterns, LIFER permits semantic checking during response
evaluation. The designer can define response functions to return an
error signal if they encounter a problem. Control then reverts back to
the parser, which looks for an alternative interpretation of the input.

The response expression may be defined either to return a
value directly (in which case semantic checking may be performed
immediately) or to return a LISP S-expression to be evaluated later.
The latter is preferable if evaluation is expensive or entails output,
since the same expression can be encountered several times during the
search for an acceptable parse. In Hawkeye, many response expressions
can be evaluated immediately, but when a data-base query is concerned it
is more efficient to construct a complex S-expression that is
transmitted to the data base server and evaluated as a single entity.
Thus, LIFER might be used to translate a complex English sentence, such
as Where is the city hall of the nearest city to the smallest lake in
California?, into a complex LISP expression, such as

    (LOCATION
      (CITY.HALL
        (NEAREST
          (LEAST AREA (IN.REGION CALIFORNIA LAKES))
          CITIES)))

LIFER  possesses  two  features  that  make  interaction  easier  for 
the  user.  The  ellipsis  feature  enables  the  user  to  type  in  only  a 
fragment  of  a sentence.  If  it  cannot  make  sense  of  the  fragment  as  an 
independent  utterance.  LIFER  tries  to  fit  the  fragment  into  the  context 
of the previous input. For example, if the user asks: Where is the city
hall of Oakland?, and then asks: of Concord?, LIFER assumes he meant:
Where is the city hall of Concord? and responds appropriately.

The paraphrase feature allows the user to make his own
extensions to the grammar, without having to understand the mechanisms
or formats involved. For example, he can type: Let "Where is Oakland
city hall?" be a paraphrase of Where is the city hall of Oakland?
LIFER analyzes both example sentences, compares them, and decides that
the appropriate thing to do is to extend the definition of <PLACE> by
adding the pattern (THE <CITY> CITY HALL) with the response expression
(CITY.HALL <CITY>). The grammar will subsequently accept: Where is
Concord city hall?

There are many contractions commonly used in conversation, and
often in written communication, for example, "where's" for "where is"
and "I'm" for "I am". Many of these could be readily catered for with
the existing LIFER machinery, but some would cause problems. For
example, "can't" expands to "cannot", and "can" should really be
associated with one phrase and the qualifier "not" with another. To
permit this, we added a facility to LIFER to perform expansion of
recognized contractions before the input is passed to the parser. The
same facility also deals with punctuation, making sure that it is
properly separated from the words and discarding ignorable punctuation
(such as question marks or periods). It also permits the user or
designer to define synonyms, such as "c.h" for "city hall."
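
The preprocessing step can be sketched as a pass over the input words
before parsing. The Python example below is hypothetical throughout: the
tables, the choice of which punctuation is ignorable, and the function
name preprocess are assumptions made for illustration, not the facility
added to LIFER.

```python
import re

CONTRACTIONS = {"WHERE'S": "WHERE IS", "I'M": "I AM", "CAN'T": "CAN NOT"}
SYNONYMS = {"C.H": "CITY HALL"}
IGNORABLE = {"?", ".", "!"}


def preprocess(text):
    """Return the list of words and punctuation marks passed to the parser."""
    words = []
    for token in text.upper().split():
        # Separate trailing punctuation from the word itself; periods
        # inside a word (as in C.H) are left alone.
        word, marks = re.match(r"^(.*?)([?.!,;:]*)$", token).groups()
        word = SYNONYMS.get(word, CONTRACTIONS.get(word, word))
        words.extend(word.split())
        # Keep significant punctuation as separate items; drop the rest.
        words.extend(mark for mark in marks if mark not in IGNORABLE)
    return words


query = preprocess("Where's c.h of Oakland?")
```

Because "can't" becomes the two words CAN and NOT, the parser is free to
attach each to a different phrase.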

The  amount  of  work  involved  in  designing  the  grammar  for  the 
user  interface  was  comparatively  small  — about  a man-month.  The 
grammar  produced  covers  many  different  ways  of  expressing  commands  that 
exercise  most  of  the  system’s  capabilities.  Our  experience  with  the 
system  has  shown  that  it  is  useful  to  be  able  to  give  commands  in 
English,  once  the  grammar  has  been  specified  sufficiently  well  to  cover 
the  preferred  constructs  used  by  a small  group  of  users.  A user  can  be 
relatively  free  in  expressing  himself  and  the  system  will  understand. 

We do not believe, however, that we yet have a grammar complete enough
to understand the full range of constructs commonly used by most of the
population. At this point, it is not clear whether the grammar could be
extended sufficiently without requiring significant reorganization of at
least some of its parts. It appears that in situations involving naive
users and restricted domains, an interface of the LIFER variety is well
worthwhile. For richer domains with complex structured data and a wide
range of capabilities, the interface may become disproportionately large
and complex. The Hawkeye system, and its data base in particular, may
be a domain that is already too rich for a simple grammar covering most
constructs. Natural language may be unnecessary for an experienced user
who can spare the time to learn an extensive formal language.

3. Help Facility

A user, no matter how experienced, dealing with a complex system will
occasionally require assistance, perhaps in the form of a reference
manual. We have therefore provided a simple help facility for the
Hawkeye system.

Although a user in need of help may be able to frame a very
specific question, there are usually few constraints on what might be
asked. Rather than try to cover the potentially enormous range of
questions that might be posed, we decided to use the English interface
only to invoke the help facility. The facility itself operates by
presenting the user with a chunk of information and giving him a small
set of choices for getting further information.

The  help  information  is  organized  hierarchically,  with  chunks 
of  information  associated  with  nodes  in  a tree  of  contexts.  A context 
also  has  a heading  or  name  that  reflects  its  position  in  the  hierarchy. 
The topmost node, for example, has the title "Hawkeye." Each context
has  various  types  of  information  associated  with  it: 

• Description: a brief (one page at most) description.

• Subtopics: a list of the more specific contexts immediately
below this context.

• Instructions: how to accomplish a particular goal, in as
much detail as appropriate for this context level.

• References: other documentary sources of information.

• Parameters: where appropriate.

• Relations: to other parts of the system.

• People: for assistance or complaints.




Thus  the  help  context  tree  has  several  types  of  information  associated 
with  each  context,  and  the  tree  descends  only  from  the  subtopics. 


The top-level node, with the context "Hawkeye," has
information under the headings "Description" and "Subtopics". "Hawkeye
: description" has a brief (one paragraph) overview of the Hawkeye
system. "Hawkeye : subtopics" reveals that there is more detailed
information about hardware, the data base, application programs, the
help facility, people, documentation, and the task queue. Each subtopic
has its own brief description, and most also have their own subtopics.

A user in a particular help context (corresponding to a node
on the information tree) can enter an information type (e.g.,
description, subtopics, instructions) and have the appropriate text
printed. For example, requesting subtopics information results in
presentation of a list of subtopics for which more detailed information
exists. The user can then type a subtopic name (or its position number
in the list) and descend to the more detailed context. Alternatively,
the user can ascend to the context immediately above the current
context, by typing "Up," or to the topmost context of the tree, by
typing "Top." The user can cycle through the information types for the
current context, by typing "More," and cycle backward by typing "Back."
To redisplay the text for the current context and type, the user types
"Print," and to find out the current context he types "Context." The
latter command might print, for example,

HAWKEYE  / DATABASE  / DIGITIZED  IMAGES  : DESCRIPTION 

The user can obtain a complete index of the context tree by
typing "Index," and will then see a formatted and alphabetized list of
the topics, subtopics, subsubtopics, and so forth:

HAWKEYE
    APPLICATION PROGRAMS
        ASK
        CORRESPONDENCE
        IMAGE ANALYSIS
        ROAD TRACER
        SHIP MONITOR
        TELL
    DATABASE
    DOCUMENTATION
    HARDWARE
        GRAPHICS TABLE
        RAMTEK COLOR DISPLAY
        TRACK BALL
    HELP FACILITY
    PEOPLE
    TASK QUEUE

The user can jump directly to a particular context simply by
typing "Search" followed by the topic name. The user is then placed
in the most general context concerning that topic. A user who
invokes the help facility by simply typing "Help" is placed in the
context that describes the help facility itself, and information about
the basic help commands is printed.
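
Navigation of the context tree can be sketched as follows. This Python
example covers only a few of the commands (a subtopic name, Up, Top,
Search, and Context); the class names, the breadth-first search, and the
sample tree are assumptions chosen for illustration.

```python
from collections import deque


class Context:
    """One node of the help tree."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.subtopics = []
        if parent is not None:
            parent.subtopics.append(self)

    def path(self):
        names, node = [], self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return " / ".join(reversed(names))


class HelpSession:
    def __init__(self, root):
        self.root = root
        self.current = root
        self.info_type = "DESCRIPTION"

    def search(self, topic):
        """Breadth-first, so the shallowest (most general) match wins."""
        pending = deque([self.root])
        while pending:
            node = pending.popleft()
            if node.name == topic:
                return node
            pending.extend(node.subtopics)
        return None

    def command(self, line):
        """Obey one command and return the resulting context line."""
        text = line.strip().upper()
        word, _, argument = text.partition(" ")
        if text == "UP":
            self.current = self.current.parent or self.current
        elif text == "TOP":
            self.current = self.root
        elif word == "SEARCH":
            self.current = self.search(argument) or self.current
        elif text != "CONTEXT":
            for child in self.current.subtopics:   # descend by name
                if child.name == text:
                    self.current = child
        return f"{self.current.path()} : {self.info_type}"


root = Context("HAWKEYE")
database = Context("DATABASE", root)
Context("DIGITIZED IMAGES", database)
hardware = Context("HARDWARE", root)
Context("TRACK BALL", hardware)
session = HelpSession(root)
```

Typing a subtopic name descends one level, so the command vocabulary
stays small however large the tree becomes.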

4.  Tasking  and  Reporting 

Tasking and reporting are important parts of an overall photo
interpretation operation. Hawkeye has facilities representing both. A
task queue contains requests expressed in English, as they might be
submitted by intelligence analysts. The user can ask the system whether
there are any tasks in the queue, for example: Are there any more
requests? Even if the user does not pose the question, the system will
automatically check at intervals (currently about five minutes) and warn
the user if the queue is not empty.

The user can ask: What is the next task?, and the next request
is read in from the queue and printed. The user presumably then tries
to comply with the request. When finished, the user can report findings
directly in text by "Report 'The latest coverage indicates ....'", as
well as by putting information into the data base. He or she then tells
Hawkeye "I've finished that job," or something similar, and asks for the
next task.

If a task is sufficiently straightforward for Hawkeye to
perform automatically, the user can ask Hawkeye to "do it
automatically." Hawkeye then uses the natural language machinery to
interpret the request and execute it. The user can also say "do all the
tasks automatically," and the system will attempt to perform each task
in turn, until the queue is empty. If a request is encountered that
Hawkeye is unable to parse or execute, it informs the user, prints the
request, and stops so that the request may be fulfilled interactively.
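
The automatic processing loop might look like the following Python
sketch. The function do_all_automatically, the interpret callback, and
the sample requests are hypothetical; in Hawkeye the interpretation step
is performed by the natural language machinery.

```python
from collections import deque


def do_all_automatically(queue, interpret):
    """Perform queued requests in turn; stop at the first failure.

    interpret(request) returns a zero-argument action, or None when the
    request cannot be parsed. Returns (results, failed_request), where
    failed_request is None if the queue was emptied.
    """
    results = []
    while queue:
        request = queue[0]
        action = interpret(request)
        if action is None:
            return results, request      # left queued for the user
        try:
            results.append(action())
        except Exception:
            return results, request      # parsed but could not execute
        queue.popleft()
    return results, None


# Placeholder interpreter that understands a single request.
known = {"COUNT THE BOXCARS": lambda: "counted"}
tasks = deque(["COUNT THE BOXCARS", "ASSESS THE HARBOR", "COUNT THE BOXCARS"])
done, stuck = do_all_automatically(tasks, known.get)
```

The failed request stays at the head of the queue, so the user can deal
with it interactively and then resume automatic processing.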


The Hawkeye system, at its present stage of development, contains
many utility service routines that are fundamental to almost all
nontrivial photo interpretation tasks. The most important of these are
the data base for specific and general knowledge, the image-processing
routines for realistic size and resolution of imagery, the coordinate
transformation and calibration routines for relating images to the real
world, the interprocess communication routines for organizing large,
resource sharing systems, and the routines for communicating with a
human user through multiple media.

Some of the components are now sufficiently self-contained or
well-defined that they can be extracted and used elsewhere. The basic
data-base routines, for example, enable construction and manipulation of
very large semantic nets, not restricted to representing maps. Also, the
interprocess communication routines are of general use in building
large, complex programs. These routines will soon be employed by other
projects at SRI, and are freely available to other members of the
image-understanding community.

Our work in the initial stages of this project has been mainly
exploratory: we believed that many cartographic and photo interpretive
tasks currently performed manually could be at least partially automated
using an image understanding approach. We looked at examples of four
common tasks that are labor intensive, tedious, and bottlenecks to
productivity: tracing, counting, measuring, and monitoring. Techniques
were developed to automate (or at least greatly facilitate) each of the
selected tasks. In several cases, performance on our limited data set
has been sufficiently good to warrant more extensive testing and
evaluation. We are, for example, planning to transfer the interactive
road tracing routines to the U.S. Army Engineering Topographic
Laboratories.

The techniques used for ship monitoring and boxcar counting
demonstrated the use of a digital map to guide automatic interpretation
of imagery: a key concept was that relatively simple analysis techniques
could be used once an appropriate area of the image and its context were
defined. The usefulness of this approach is somewhat limited by the
infeasibility of developing specialized programs for every different
monitoring and counting application. A photo interpreter must count
boxcars, oil derricks, planes, and many other articles with equal
dexterity in a wide range of contexts. Although ways can be found in
many cases to automate such tasks, using the map for guidance, clearly
no single image-processing algorithm will suffice. It is therefore
necessary to develop systems that can be rapidly taught to recognize new
objects by interactively designating examples in an image, along the
lines of [7] and [8]. Before such systems are possible, however, it
is first necessary to determine appropriate analysis techniques, and
knowledge required to guide them, by investigating a range of example
tasks. We have already made preliminary investigations of several
different tasks, but a more detailed study of a broad range of
situations is now required.

Another  serious  problem  concerns  the  reliability  of  the  system. 
Using  simple  techniques  and  limited  amounts  of  knowledge  renders  the 
system  easily  fooled;  the  ship  monitor,  for  example,  does  not  invoke 
much  knowledge  of  ships  and  thus  cannot  always  distinguish  between  a 
single  large  ship  and  a cluster  of  smaller  craft.  In  order  to  attain  a 
high  level  of  autonomy  and  robustness  over  a wide  range  of  situations, 
it  is  necessary  to  provide  the  system  with  a great  deal  of  knowledge  and 
the  routines  for  exploiting  it.  To  make  progress  in  this  area,  it  is 
necessary  to  select  some  class  of  tasks  and  a domain  that  are 
sufficiently  constrained  that  it  appears  possible  to  codify  most  of  the 
required  knowledge,  and  are  still  worth  pursuing  on  the  grounds  of 
military relevance. We are now directing this project toward
specialization in such a domain.




Hawkeye demonstrated the feasibility of using knowledge about maps
and imaging to automate a variety of representative photo interpretation
tasks. With this knowledge, adequate performance was achieved in
straightforward cases, but the system was easily misled by contingencies
that it did not know about, for example, clouds. Substantially more
world knowledge and a greater range of capabilities that use the
knowledge are necessary to approach human performance. The Hawkeye
system framework provides a suitable foundation for integrating the
knowledge and capabilities into an expert system. In the next stage of
our research, we plan to develop a system with considerable expertise in
a specialized task area.

The task we have selected is that of monitoring traffic on roads.
More specifically, given a sequence of reconnaissance images of a region
under surveillance, possibly taken under adverse viewing conditions, the
system will first locate sections of known roads visible in the images,
locate anomalous regions on the roads whose size, shape, velocity, and
other characteristics are consistent with those of vehicles, and then
perform a detailed scene analysis in the vicinity of the anomalies in
order to identify specific vehicle types.

In order to attain the level of performance for which we aim, the
system will require expertise concerning roads and vehicles, including
knowledge of a wide variety of situations and events, such as
obscuration of roads by trees or clouds, the visual effects of snow and
rain, the behavior of roads at intersections, mountains, tunnels, and so
forth. The system will also require knowledge of its repertoire of
resources, their abilities and limitations, and how to evaluate its own
performance.


Appendix A

THE  MAP  DATA-BASE  SERVER 




The map data base is the primary source of knowledge for all task
specialists in Hawkeye. In this Appendix, we discuss the implementation
details of the map data base. We begin with the primitive file storage
system and then discuss the semantic data base and the overlaid
geometrical data base.

1. File/Buffer System

The file data in the IU data base is stored in extended
atom-property list format. That is, the items stored in the data base
are LISP literal atoms and their associated values are lists in standard
LISP property list format. The file is held in text format, since the
LISP reading routines are very efficient at converting text to internal
data structures.

In  order  to  retrieve  an  item  value  from  the  file,  a (hashed)  file 
address  is  computed  for  the  atomic  token,  and  the  associated  value  is 
retrieved  from  that  location  in  the  file  and  stored  in  a core-resident 
buffer,  called  DBBUFFER.  Subsequent  accesses  to  the  same  token  will 
retrieve  the  value  stored  in  DBBUFFER.  The  buffer  typically  has  a 
capacity  of  500  to  1000  tokens. 

When DBBUFFER is full and another slot is required, the system
empties the least recently used (LRU) slot. If the LRU item has been
modified, it is written onto the end of the file and its hash entry
updated to point to the current data; if not, it is simply discarded.
The new data element is inserted into the now-vacant slot, which is then
promoted to be the most recently used slot. Any access to a data
element causes its slot to be promoted to the front of the buffer,
maintaining the ordering by recency of use.
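The buffer policy described above can be sketched as follows. This is an
illustrative Python model only (the data base itself is LISP-based); the
class and method names are hypothetical, and a plain dictionary stands
in for the hashed file, so "writing onto the end of the file and
updating the hash entry" is reduced to a dictionary assignment.

```python
from collections import OrderedDict

class DBBuffer:
    """Sketch of an LRU buffer that writes back modified items on eviction."""

    def __init__(self, capacity, backing_file):
        self.capacity = capacity
        self.file = backing_file      # assumption: dict standing in for the hashed file
        self.slots = OrderedDict()    # token -> [value, modified]; first = most recent

    def _promote(self, token):
        # Move the slot to the front of the buffer (most recently used).
        self.slots.move_to_end(token, last=False)

    def _make_room(self):
        if len(self.slots) >= self.capacity:
            # The last slot is the least recently used one.
            token, (value, modified) = self.slots.popitem(last=True)
            if modified:
                self.file[token] = value   # write back; otherwise simply discard

    def get(self, token):
        if token not in self.slots:
            self._make_room()
            self.slots[token] = [self.file.get(token), False]
        self._promote(token)
        return self.slots[token][0]

    def put(self, token, value):
        if token not in self.slots:
            self._make_room()
        self.slots[token] = [value, True]
        self._promote(token)
```

In this model a modified value reaches the backing store only when its
slot is evicted, which mirrors the write-on-eviction behavior described
in the text.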




Any token currently in DBBUFFER has on its real property list a
pointer to its slot in the buffer. This property is removed when the
token is deleted from the buffer, which allows the system to tell
immediately whether an item is in core.

The hashed file has two parts, the first of which is the hash table
consisting  of  a number  of  "buckets."  Each  bucket  contains  either  the 
byte  address  of  a piece  of  data  stored  in  the  second  part  of  the  file 
or,  if  the  bucket  is  empty,  zero.  The  data  stored  in  the  second  portion 
of  the  file  consist  of  token  names  and  their  values. 

The process of retrieving the value of a target token uses a
double-hash process with open probing. A hash token is first created
from the atom name. The value of this hash token modulo the first hash
divisor, HASH1.DIVISOR, is used as the first hash address. If the
bucket at that address is empty, the target is not in the file and NIL
is returned. If the atom stored at the location pointed to by the
bucket matches the target, its associated value is returned; otherwise,
there is a collision, and a new address must be computed. A second hash
address is computed by adding the value of the first hash address to the
value of the original hash token modulo HASH2.DIVISOR (a number equal to
the length of the hash table), and the process is repeated. If this
also results in a conflict, the system resorts to open probing (i.e., it
proceeds sequentially, beginning at the second hash address, until it
either finds the target item or an empty slot).
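The retrieval procedure can be sketched in Python as follows. Several
details are assumptions made only for illustration: the `hash_token`
function is a stand-in (the actual hash function is not given), the
second address is read as (first address + hash token) modulo the table
length, sequential probing wraps around the table, HASH1.DIVISOR is
taken to be no larger than the table length, and offset 0 marks an empty
bucket.

```python
def hash_token(name):
    """Hypothetical stand-in for the hash token created from the atom name."""
    h = 0
    for ch in name:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h

def probe_sequence(name, hash1_divisor, table_len):
    """Yield bucket addresses in the order the lookup examines them."""
    token = hash_token(name)
    first = token % hash1_divisor
    yield first
    second = (first + token) % table_len   # HASH2.DIVISOR == table length
    yield second
    for step in range(1, table_len):       # open (sequential) probing
        yield (second + step) % table_len

def lookup(name, buckets, records, hash1_divisor):
    for addr in probe_sequence(name, hash1_divisor, len(buckets)):
        offset = buckets[addr]
        if offset == 0:
            return None                    # empty bucket: not in file (NIL)
        stored_name, value = records[offset]
        if stored_name == name:
            return value
    return None

def store(name, value, buckets, records, hash1_divisor):
    records.append((name, value))          # data always go on the end of the file
    for addr in probe_sequence(name, hash1_divisor, len(buckets)):
        offset = buckets[addr]
        if offset == 0 or records[offset][0] == name:
            buckets[addr] = len(records) - 1
            return
    raise RuntimeError("hash table full")
```

Here `buckets` is a list of offsets and `records` is a list whose
element 0 is unused, so that an offset of zero can mean "empty". The
`store` function is included only so that the lookup can be exercised.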

This type of hash addressing does not allow for the easy deletion
of items from the table. In addition, since an entry is changed by
writing the new value on the end of the file, the data base file tends
to become cluttered. To alleviate these problems, a routine is provided
for "tidying" the file. This program creates a new file by going
through the hash table of the old file, retrieving all items with
nonatomic values, and inserting them into the new file. Therefore, an
item whose value has been set to NIL will be deleted when the file is
cleaned up.
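A minimal Python sketch of the tidying pass is given below. It is
illustrative only: `None` and the empty list stand in for NIL, values
are modeled as Python lists, and `simple_store` is a hypothetical
insertion routine using plain linear probing, included only to make the
sketch runnable.

```python
def is_atomic(value):
    """NIL (None or an empty list here) and non-list values count as atoms."""
    return not isinstance(value, list) or len(value) == 0

def tidy(buckets, records, store):
    """Build a new table and data area containing only live, nonatomic items."""
    new_buckets = [0] * len(buckets)
    new_records = [None]                 # offset 0 is reserved for "empty bucket"
    for offset in buckets:
        if offset == 0:
            continue
        name, value = records[offset]    # only the current copy is reachable
        if is_atomic(value):
            continue                     # e.g. value set to NIL: item is dropped
        store(name, value, new_buckets, new_records)
    return new_buckets, new_records

def simple_store(name, value, buckets, records):
    """Stand-in insertion (linear probing); not the double-hash scheme."""
    records.append((name, value))
    addr = sum(map(ord, name)) % len(buckets)
    while buckets[addr] != 0 and records[buckets[addr]][0] != name:
        addr = (addr + 1) % len(buckets)
    buckets[addr] = len(records) - 1
```

Because the pass follows only the pointers in the hash table, superseded
copies left at earlier positions in the old file are never visited and
so do not appear in the new file.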




2.  Semantic  Data  Base 


The semantic data base is a network structure, the nodes of which
are atoms stored on the data base file. The primary ordering principle
imposed on the net is the superset/subset relation. The elements of a
set are either enumerated or are implicitly determined by a special
predicate associated with the set. Canonical set elements are described
by a "delineator," which specifies the arcs that describe relationships
between the elements of the delineated set and other entities in the
data base.

Two major classes of information are stored in the semantic data
base: image-related data and map (or world) data. Image data include
everything known about photographs, digitized images, and calibration
files. The world information describes generically a set of cultural
and cartographic features and various instances of these features.

Pertinent image data consist of the date and time a photograph was
taken, the approximate camera parameters, the area covered (actually the
UTM world coordinates of the corners of the photograph), and (if
calibrated) the camera coordinates. Calibration files for images allow
the transformation between image-plane coordinates (inches), digitized
image coordinates (pixels), and map coordinates (UTM). The data base
records the file names of digitized images (and subimages) and the names
of the calibration files. These data allow programs for locating map
objects in images to retrieve the necessary information.

The geometric data structure (described below) allows the storage
and retrieval of images based on their coverage. Thus, the system can
answer such questions as "What images contain the Oakland Mole?" By
consulting date information, more complex queries, such as "What is the
most recent, calibrated photograph of the Oakland Mole?" can be
answered.

The map data in which we are interested have two main aspects: the
semantic description of the object itself and a topological/geometrical
description of the location, extent, and form of the object. Semantic
data stored about a road, for example, would include features such as
the number of lanes, the type of surface, and the classification (i.e.,
freeway, avenue). Geometric/topological data would specify the fact
that the road is a linear feature of a certain length, connecting
various places, and that entry and egress are usually possible only at
certain defined locations.

The data base has information concerning three topological types:
POINTs, ARCs, and REGIONs. A POINT is specified as the (two- or)
three-space location of the entity; an ARC is a linear list of POINTs
(and may have a specified width); a REGION is an area of space bounded
by a closed ARC. ARCs may have subarcs and REGIONs may have subregions.
These descriptions, in addition to being stored in the semantic data
structure, are also stored in the geometric structure described below.

These topological descriptions are used where appropriate. For
example, ground control points and buildings are stored as POINTs.
Roads and coastlines are stored as ARCs. Cities and bodies of water are
stored as REGIONs.

The  current  data  base  stores  generic  information  about  cities, 
roads,  railroads,  bridges,  coastlines,  rivers,  lakes,  bays,  islands, 
hills,  peninsulas,  buildings,  piers,  channels,  and  airports.  The 
complexity of descriptions spans a wide range. A lake, for example, is
described simply as a body of water surrounded by land. The description
of  a bridge  is  at  the  more  complex  end  of  the  spectrum.  A bridge  is 
described  as  a linear,  segmented  path  over  a collection  of  world 
entities.  The  OVERGOER  must  be  an  element  of  a restricted  set  of 
linear,  cultural  features,  such  as  roads  or  railroads,  and  the 
UNDERGOERS  must  be  from  a larger  set  of  cultural  and  cartographic 
features.  There  is  typically  one  bridge  segment  for  each  of  the 
UNDERGOERS.  The  description  also  contains  data  about  the  type  of 
supporting  structure  and  the  mean  height  of  each  segment. 

Instances of world entities in the current data base are drawn from
the San Francisco Bay Area, and include various cities (including San
Francisco, Oakland, and Berkeley) and bodies of water (e.g., San
Francisco Bay), and various lakes and reservoirs. The cities contain
roads, railroads, and buildings. Coastlines, piers, and dock areas are
specified where appropriate. Selected bridges (e.g., the Golden Gate
and the San Francisco-Oakland Bay Bridges) are described. In addition,
the data base stores a large number of ground control points selected to
cover the areas of interest.


3. Geometric Data Base


Many  entities  in  the  semantic  data  base  have  a geometric  structure. 
Roads,  for  example,  are  linear  structures  passing  through  certain 
specified areas. Cities have the properties of regions, including a
defined  area  and  the  ability  to  contain  other  geometric  entities  within 
their  boundaries. 


Many questions that could be asked of an IU data base require the
direct use of these geometric properties. For example, answering the
query "What roads are in Berkeley?" requires the system to determine the
physical area covered by the city of Berkeley, and then to determine
which of the roads known to it pass through that area.

To facilitate the answer to questions such as this, the system
organizes semantic entities with geometric structure into a special
geometric data base. This geometric data base is made up of a set of
tree structures called KDTREEs, patterned after the data structure
described by Bentley [4]. It is a bin structure in which a bin is split
into subbins whenever it gets too full. Bins are split in such a way as
to distribute their contents evenly.


The simplest KDTREE structure is one for storing points selected
from a two-dimensional area. Initially, all of the points are inserted
into a single bin, stored at the root of the KDTREE. If the bin is too
large (i.e., has too many points), it is split into two subbins by
comparing the first coordinate (say X) of each point with the median
value for that coordinate over all points in the bin. If the X value of
the point being tested is greater than the median X value (called the
pivot), it is placed in the HIGH bin; otherwise it goes into the LOW
bin. After all points have been tested in this manner, there will be
two new bins of approximately equal size. These are stored as
descendants of the node being split, and the original bin is deleted.
The pivot is stored with the node as well.

This  process  is  repeated  on  the  new  offspring  bins,  selecting  a new 
coordinate  each  time.  Thus,  at  the  second  level,  the  median  Y value 
would be the pivot, and at the third level, X would be used again. The
resulting  structure  is  a well-balanced  tree  whose  tip  nodes  contain  the 
data-storage  bins. 
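The recursive median split can be illustrated with a brief Python
sketch. The dictionary-based node layout, the function name, and the bin
capacity are assumptions made for illustration; the sketch also simply
leaves a bin unsplit when every point would fall on the LOW side.

```python
from statistics import median

BIN_CAPACITY = 4   # hypothetical threshold for "too full"

def build(points, depth=0, capacity=BIN_CAPACITY):
    """Return a tip node {'bin': [...]} or an inner node with a pivot."""
    if len(points) <= capacity:
        return {"bin": list(points)}
    axis = depth % 2                          # alternate X (0) and Y (1)
    pivot = median(p[axis] for p in points)
    high = [p for p in points if p[axis] > pivot]
    low = [p for p in points if p[axis] <= pivot]
    if not high:                              # no separation on this axis
        return {"bin": list(points)}
    return {"axis": axis, "pivot": pivot,
            "low": build(low, depth + 1, capacity),
            "high": build(high, depth + 1, capacity)}
```

Inner nodes keep only the axis and pivot; all points end up in the bins
at the tip nodes, as in the structure described above.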

A new point is inserted into the tree by starting at the root node,
comparing the appropriate coordinate value of the new point with the
pivot, and recursively checking either the HIGH or the LOW offspring,
until a tip node is reached. This is the bin which "covers" the point,
and so the point is inserted here. If, after insertion, the bin is too
full, it is split as described above.
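Insertion with splitting on overflow might look like the following
Python sketch. The node layout, names, and capacity are hypothetical,
and the descent is written as a loop rather than a recursion; a bin
whose points cannot be separated on the current coordinate is simply
left over-full here.

```python
from statistics import median

CAPACITY = 4   # hypothetical bin capacity

def new_tree():
    return {"bin": [], "depth": 0}

def split(node):
    """Turn an over-full tip node into an inner node with LOW/HIGH bins."""
    axis = node["depth"] % 2
    pts = node["bin"]
    pivot = median(p[axis] for p in pts)
    low = [p for p in pts if p[axis] <= pivot]
    high = [p for p in pts if p[axis] > pivot]
    if not high:
        return                      # cannot separate; leave the bin as is
    d = node["depth"] + 1
    del node["bin"]                 # the original bin is deleted
    node.update(axis=axis, pivot=pivot,
                low={"bin": low, "depth": d},
                high={"bin": high, "depth": d})

def insert(node, point):
    while "bin" not in node:        # descend to the covering tip node
        side = "high" if point[node["axis"]] > node["pivot"] else "low"
        node = node[side]
    node["bin"].append(point)
    if len(node["bin"]) > CAPACITY:
        split(node)
```

Inserting five points into an empty tree with this capacity produces one
split on X, after which further points are routed by comparison with the
stored pivot.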

The system can determine which points in a KDTREE are contained in
a specified rectangular area by first discovering which bins are
overlapped by the rectangle. These are found by subdividing the
rectangle, beginning at the root node, and splitting the rectangle at
the pivot coordinate (if that coordinate falls within the rectangle).
The subrectangles are split again at the next level, and so on to the
tip nodes. When a tip node is reached, each point in the bin is tested
to see if it falls within the subrectangle, and if so, will be returned
when the process terminates.
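The rectangle search can be sketched in Python as follows. The `build`
helper and node layout are assumptions included only so that the sketch
is self-contained; rectangles are given by their lower-left and
upper-right corners and are treated as closed.

```python
from statistics import median

def build(points, depth=0, capacity=2):
    """Minimal median-split tree builder (hypothetical layout)."""
    if len(points) <= capacity:
        return {"bin": list(points)}
    axis = depth % 2
    pivot = median(p[axis] for p in points)
    low = [p for p in points if p[axis] <= pivot]
    high = [p for p in points if p[axis] > pivot]
    if not high:
        return {"bin": list(points)}
    return {"axis": axis, "pivot": pivot,
            "low": build(low, depth + 1, capacity),
            "high": build(high, depth + 1, capacity)}

def in_rectangle(node, lo, hi):
    """Return points p with lo[i] <= p[i] <= hi[i] on both axes."""
    if "bin" in node:
        return [p for p in node["bin"]
                if lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1]]
    axis, pivot = node["axis"], node["pivot"]
    found = []
    if lo[axis] <= pivot:            # part of the rectangle is on the LOW side
        sub_hi = list(hi)
        sub_hi[axis] = min(hi[axis], pivot)
        found += in_rectangle(node["low"], lo, sub_hi)
    if hi[axis] > pivot:             # part of the rectangle is on the HIGH side
        sub_lo = list(lo)
        sub_lo[axis] = max(lo[axis], pivot)
        found += in_rectangle(node["high"], sub_lo, hi)
    return found
```

Only the bins overlapped by the rectangle are visited, so the number of
points examined individually is typically much smaller than the total
stored.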

As an example, ground control points (GCPs) are stored in a
three-dimensional KDTREE. In order to find the closest GCP to a given
location, P, the system first finds which bin P would fall into, and
selects the nearest point, P1, in that bin. Since there may be a closer
point in an adjacent bin, a square area centered about P, with a side
length of twice the distance from P to P1, is created. Any points that
fall within the square that were not previously checked are examined; if
the nearest point to P from that set is closer than P1, that point is
returned; otherwise, P1 is returned.
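A Python sketch of this nearest-point search is shown below. For
brevity it uses two-dimensional points rather than the three-dimensional
GCP tree, assumes a nonempty tree, and re-examines points already
checked in the covering bin instead of skipping them; the helper names
and node layout are hypothetical.

```python
import math
from statistics import median

def build(points, depth=0, capacity=4):
    """Minimal median-split tree builder (hypothetical layout)."""
    if len(points) <= capacity:
        return {"bin": list(points)}
    axis = depth % 2
    pivot = median(p[axis] for p in points)
    low = [p for p in points if p[axis] <= pivot]
    high = [p for p in points if p[axis] > pivot]
    if not high:
        return {"bin": list(points)}
    return {"axis": axis, "pivot": pivot,
            "low": build(low, depth + 1, capacity),
            "high": build(high, depth + 1, capacity)}

def in_rectangle(node, lo, hi):
    """Points inside the closed rectangle, visiting only overlapped bins."""
    if "bin" in node:
        return [p for p in node["bin"]
                if lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1]]
    axis, pivot = node["axis"], node["pivot"]
    found = []
    if lo[axis] <= pivot:
        found += in_rectangle(node["low"], lo, hi)
    if hi[axis] > pivot:
        found += in_rectangle(node["high"], lo, hi)
    return found

def covering_bin(node, p):
    while "bin" not in node:
        node = node["high"] if p[node["axis"]] > node["pivot"] else node["low"]
    return node["bin"]

def nearest(tree, p):
    p1 = min(covering_bin(tree, p), key=lambda q: math.dist(p, q))
    r = math.dist(p, p1)
    best = p1
    # Square centered on P with side length 2r.
    for q in in_rectangle(tree, (p[0] - r, p[1] - r), (p[0] + r, p[1] + r)):
        if math.dist(p, q) < math.dist(p, best):
            best = q
    return best
```

Any point closer to P than P1 must lie within distance r of P and hence
inside the square, which is why checking the square is sufficient.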




The KDTREE structure was extended to allow storage and retrieval of
features that cover a finite area, such as ARCs and REGIONs. For these
cases, a bounding rectangular volume is computed and any bin overlapped
by that volume is further tested to see if the feature actually falls
within it. If so, it is inserted into the bin.

When a bin is split, the pivot is computed as the median of the
centers-of-gravity of the subvolumes for features in that bin. In this
case, however, some bounding volumes will be split (such as the one
which provided the pivot), and will appear in both the HIGH and the LOW
offspring. It may even be the case that all of the volumes will have
been split, and each new bin has the same number of entries as the
original. If this happens, the system will attempt to select a new
coordinate for the pivot. If all coordinates are checked and no
improvement results, the bin cannot be split.
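The splitting rule for bins of extended features can be sketched in
Python as follows. The sketch uses two-dimensional bounding rectangles
(xmin, ymin, xmax, ymax) in place of bounding volumes, and it counts a
split as an improvement only if both new bins are smaller than the
original, which is slightly stricter than the condition stated above;
the function name and return convention are hypothetical.

```python
from statistics import median

def split_bin(boxes, first_axis=0):
    """Split a bin of bounding rectangles.

    boxes: dict mapping feature name -> (xmin, ymin, xmax, ymax).
    Returns (axis, pivot, low, high), or None if no coordinate helps.
    """
    for k in range(2):
        axis = (first_axis + k) % 2
        lo_i, hi_i = axis, axis + 2
        # Pivot is the median of the box centers along this axis.
        pivot = median((b[lo_i] + b[hi_i]) / 2 for b in boxes.values())
        # A box straddling the pivot appears in both offspring.
        low = {n: b for n, b in boxes.items() if b[lo_i] <= pivot}
        high = {n: b for n, b in boxes.items() if b[hi_i] > pivot}
        if max(len(low), len(high)) < len(boxes):
            return axis, pivot, low, high
    return None                      # the bin cannot be split
```

For instance, two long east-west roads and a small pier cannot be
separated on X, but a split on Y succeeds, with the road that supplied
the pivot appearing in both new bins.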

Using programs to compute the distance from a point to a linear or
area feature, the system can determine the closest such feature to a
selected point. This allows the system to answer questions such as
"What road is nearest to the Transamerica Pyramid?" The question "What
roads are in Berkeley?" is answered by computing a bounding rectangle
for the area associated with the city of Berkeley, and retrieving all
known roads that are enclosed, using a process similar to that described
above for retrieving point features. These candidates are further
checked to ensure they are actually within the boundary of the city.
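The two-stage query (bounding-rectangle retrieval followed by an exact
check) can be sketched in Python as below. This is an illustration under
several assumptions: a linear scan stands in for the KDTREE retrieval,
regions are simple polygons, roads are lists of points, and a road is
accepted if any of its points lies inside the region. All names are
hypothetical.

```python
def bbox(points):
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)

def overlaps(a, b):
    """True if two (xmin, ymin, xmax, ymax) rectangles intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def inside(point, polygon):
    """Ray-casting point-in-polygon test."""
    x, y = point
    result = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                result = not result
    return result

def roads_in(region, roads):
    box = bbox(region)
    # Stage 1: cheap candidate retrieval by bounding rectangle.
    candidates = {name: pts for name, pts in roads.items()
                  if overlaps(bbox(pts), box)}
    # Stage 2: exact check against the region boundary.
    return sorted(name for name, pts in candidates.items()
                  if any(inside(p, region) for p in pts))
```

The second stage matters for regions that do not fill their bounding
rectangle: a road lying in the notch of an L-shaped city passes the
first stage but is rejected by the second.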

KDTREEs  are  provided  for  most  of  the  elements  of  interest  in  the 
data base. Specific trees include those for roads, GCPs, images,
coastlines,  and  buildings. 




REFERENCES 


1.  Barrow, H. G., et al., "Interactive Aids for Cartography and Photo
Interpretation," in Artificial Intelligence -- Research and
Applications, Annual Progress Report to ARPA, Contract
DAAG29-76-C-0012, Stanford Research Institute, Menlo Park, California
(June 1976).

2.  Barrow, H. G., "Interactive Aids for Cartography and Photo
Interpretation," Semiannual Technical Report to ARPA, Contract
DAAG29-76-C-0012, Stanford Research Institute, Menlo Park,
California (November 1976).

3.  Barrow, H. G., "Interactive Aids for Cartography and Photo
Interpretation," Semiannual Technical Report to ARPA, Contract
DAAG29-76-C-0057, Stanford Research Institute, Menlo Park,
California (May 1977).

4.  Bentley, J. L., "Multidimensional Binary Search Trees Used for
Associative Searching," CACM, Vol. 18, No. 9 (September 1975).

5.  Hendrix, G. G., "The LIFER Manual: A Guide to Building Natural
Language Interfaces," Tech. Note 138, Stanford Research Institute,
Menlo Park, California (February 1977).

6.  Sacerdoti, E. D., "Language Access to Distributed Data with Error
Recovery," Proc. Fifth IJCAI, Vol. 1, pp. 196-202, Cambridge,
Massachusetts (August 1977).

7.  Garvey, T. D., "Perceptual Strategies for Purposive Vision," Tech.
Note No. 117, Stanford Research Institute, Menlo Park, California
(1976).

8.  Tenenbaum, J. M., Garvey, T. D., Weyl, S. A., and Wolf, H. C.,
"ISIS: An Interactive Facility for Scene Analysis Research," Proc.
Second International Joint Conference on Pattern Recognition,
Lyngby-Copenhagen, Denmark, pp. 123-125 (August 1974).




